library(RSQLite)
library(dbplyr)
package ‘dbplyr’ was built under R version 3.6.2
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
library(janitor)
package ‘janitor’ was built under R version 3.6.2
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’:
chisq.test, fisher.test
library(lubridate)
package ‘lubridate’ was built under R version 3.6.2
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
library(datasets)
library(ggthemes)
library(gganimate)
package ‘gganimate’ was built under R version 3.6.2
Loading required package: ggplot2
package ‘ggplot2’ was built under R version 3.6.2
library(modelr)
package ‘modelr’ was built under R version 3.6.2
library(broom)
package ‘broom’ was built under R version 3.6.2
Attaching package: ‘broom’
The following object is masked from ‘package:modelr’:
bootstrap
library(ggfortify)
package ‘ggfortify’ was built under R version 3.6.2
library(infer)
package ‘infer’ was built under R version 3.6.2
library(MASS)
package ‘MASS’ was built under R version 3.6.2
library(tseries)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
‘tseries’ version: 0.10-47
‘tseries’ is a package for time series analysis and computational
finance.
See ‘library(help="tseries")’ for details.
library(forecast)
package ‘forecast’ was built under R version 3.6.2
Registered S3 methods overwritten by 'forecast':
method from
autoplot.Arima ggfortify
autoplot.acf ggfortify
autoplot.ar ggfortify
autoplot.bats ggfortify
autoplot.decomposed.ts ggfortify
autoplot.ets ggfortify
autoplot.forecast ggfortify
autoplot.stl ggfortify
autoplot.ts ggfortify
fitted.ar ggfortify
fortify.ts ggfortify
residuals.ar ggfortify
library(fable)
package ‘fable’ was built under R version 3.6.2
Loading required package: fabletools
package ‘fabletools’ was built under R version 3.6.2
Attaching package: ‘fabletools’
The following objects are masked from ‘package:forecast’:
accuracy, forecast
The following object is masked from ‘package:infer’:
generate
library(fabletools)
library(tsibble)
package ‘tsibble’ was built under R version 3.6.2
Attaching package: ‘tsibble’
The following object is masked from ‘package:lubridate’:
interval
library(tsibbledata)
package ‘tsibbledata’ was built under R version 3.6.2
library(feasts)
package ‘feasts’ was built under R version 3.6.2
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
✓ tibble 3.0.3 ✓ dplyr 1.0.2
✓ tidyr 1.1.2 ✓ stringr 1.4.0
✓ readr 1.3.1 ✓ forcats 0.5.0
✓ purrr 0.3.4
package ‘tibble’ was built under R version 3.6.2
package ‘tidyr’ was built under R version 3.6.2
package ‘purrr’ was built under R version 3.6.2
package ‘dplyr’ was built under R version 3.6.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
x lubridate::as.difftime() masks base::as.difftime()
x broom::bootstrap() masks modelr::bootstrap()
x lubridate::date() masks base::date()
x dplyr::filter() masks stats::filter()
x dplyr::ident() masks dbplyr::ident()
x lubridate::intersect() masks base::intersect()
x tsibble::interval() masks lubridate::interval()
x dplyr::lag() masks stats::lag()
x dplyr::select() masks MASS::select()
x lubridate::setdiff() masks base::setdiff()
x dplyr::sql() masks dbplyr::sql()
x lubridate::union() masks base::union()
library(leaflet)
Registered S3 methods overwritten by 'htmltools':
method from
print.html tools:rstudio
print.shiny.tag tools:rstudio
print.shiny.tag.list tools:rstudio
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
# Connecting
conn <- dbConnect(SQLite(), "raw_data/FPA_FOD_20170508.sqlite")
# Pulling all the names of the tables in the database file
as.data.frame(dbListTables(conn))
# Making fires dataframe
fires <- tbl(conn, "Fires") %>% collect()
# EPSG worldwide geodetic parameter dataset system
spatial_ref <- tbl(conn, "spatial_ref_sys_all") %>% collect()
# National Wildfire Coordinating Group unit abbreviations
NWCG <- tbl(conn, "NWCG_UnitIDActive_20170109") %>% collect()
# Disconnect
dbDisconnect(conn)
fires_small <- fires %>%
select(NWCG_REPORTING_AGENCY, SOURCE_REPORTING_UNIT_NAME, FIRE_NAME,
FIRE_YEAR, DISCOVERY_DATE, DISCOVERY_DOY, DISCOVERY_TIME, CONT_DATE,
CONT_DOY, CONT_TIME, STAT_CAUSE_CODE, STAT_CAUSE_DESCR, FIRE_SIZE,
FIRE_SIZE_CLASS, LATITUDE, LONGITUDE, OWNER_CODE, OWNER_DESCR, STATE,
COUNTY, FIPS_CODE, FIPS_NAME, Shape)
fires_small <- clean_names(fires_small)
fires_small <- fires_small %>%
mutate(nwcg_reporting_agency = as.factor(nwcg_reporting_agency)) %>%
mutate(stat_cause_code = as.factor(stat_cause_code)) %>%
mutate(fire_size_class = as.factor(fire_size_class)) %>%
mutate(owner_descr = as.factor(owner_descr)) %>%
mutate(state = as.factor(state))
fires_small <- fires_small %>%
mutate(date_origin = as.Date(paste0(fire_year, "-01-01"))) %>%
# discovery_doy is 1-based, so subtract 1 to make day 1 map to 1 January
mutate(discovery_date = as.Date(discovery_doy - 1, origin = date_origin)) %>%
mutate(discovery_moy = month(discovery_date, label = TRUE)) %>%
select(-date_origin)
year_plot <- fires_small %>%
group_by(fire_year) %>%
summarise(num_fires =n())
`summarise()` ungrouping output (override with `.groups` argument)
year_plot %>%
ggplot +
aes(x = fire_year, y = num_fires) +
geom_point() +
geom_line() +
ylim(0, 120000) +
ggtitle("Amount of fires per year 1992-2015\n") +
xlab("\nYear") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
There is a lot of variation in the data between years. Visually, a repeating pattern appears to occur every 5 years or so, with 4 peaks visible within this reporting period. Having looked at the historic weather for that date range, these peaks seem to coincide with recorded heatwaves in 2000, 2006 and 2011. (1)
https://en.wikipedia.org/wiki/List_of_heat_waves
fires_small %>%
mutate(year_month = make_date(fire_year, discovery_moy)) %>%
group_by(year_month) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = year_month, y = num_fires) +
geom_line(col = "dark blue") +
ggtitle("Amount of fires per month 1992-2015\n") +
xlab("\nYear") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
Peaks are still shown to be occurring in the summers. The 2006 heatwave is especially visible.
monthly <- fires_small %>%
mutate(year_month = make_date(fire_year, discovery_moy)) %>%
group_by(year_month) %>%
summarise(num_fires = n())
`summarise()` ungrouping output (override with `.groups` argument)
write_csv(monthly, path = "clean_data/monthly.csv")
monthly
Continued on separate worksheet
fires_small %>%
group_by(discovery_date) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = discovery_date, y = num_fires) +
geom_line(col = "dark blue") +
ggtitle("Amount of fires per day 1992-2015\n") +
xlab("\nYear") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
This shows a typical time series plot with a cyclic variation due to warmer weather in the summer time.
fires_small %>%
group_by(discovery_doy) %>%
summarise(num_fires = n()) %>%
ggplot(aes(x = discovery_doy, y = num_fires)) +
geom_line(col = "dark blue") +
ggtitle("Amount of fires per day of year, for combined years 1992-2015\n") +
xlab("\nDay of Year") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
There are peaks around days 60-110 and a big peak around day 180.
fires_small %>%
group_by(discovery_doy) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
The 2 highest days of the year are 185 and 186, which are Independence Day (4th July) in a normal year and a leap year respectively. So I imagine most of the extra fires (over double the normal amount) are caused by fireworks.
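As a quick check of the day-of-year arithmetic, the sketch below uses only base R. The helper name `july4_doy` is made up for this illustration and is not defined anywhere else in this notebook.

```r
# Day of year for 4th July in a given year, using the "%j" format code
july4_doy <- function(year) {
  as.integer(format(as.Date(paste0(year, "-07-04")), "%j"))
}

july4_doy(2015)  # normal year: 185
july4_doy(2016)  # leap year: 186

# Going the other way: day-of-year values are 1-based, so subtract 1
# before adding them to 1 January of the same year
as.Date(185 - 1, origin = "2015-01-01")  # "2015-07-04"
```

The same 1-based convention matters whenever a day-of-year column is turned back into a calendar date.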
fires_small %>%
group_by(discovery_moy) %>%
summarise(num_fires = n()) %>%
ggplot(aes(x = discovery_moy, y = num_fires)) +
geom_col(fill = "dark blue", col = "black") +
ggtitle("Amount of fires per month of year, for combined years 1992-2015\n") +
xlab("\nMonth of Year") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
There are 2 definite peaks during the year. The March and April peak is possibly due to the US “Spring Break”, when schools and universities are closed and families are likely to be on vacation, possibly visiting National Parks. July and August are also the summer break for schools, with families visiting parks and hot weather both likely causes of fire outbreaks. I will have a look at the data and see if there is anything related to vacation time that could corroborate this.
https://en.wikipedia.org/wiki/School_holidays_in_the_United_States
mar_apr_causes <- fires_small %>%
# month(label = TRUE) gives abbreviated labels, so April is "Apr"
filter(discovery_moy == "Mar" | discovery_moy == "Apr") %>%
group_by(stat_cause_descr) %>%
summarise(num_fires_mar_apr = n()) %>%
arrange(desc(num_fires_mar_apr))
`summarise()` ungrouping output (override with `.groups` argument)
may_june_causes <- fires_small %>%
# month(label = TRUE) gives abbreviated labels, so June is "Jun"
filter(discovery_moy == "May" | discovery_moy == "Jun") %>%
group_by(stat_cause_descr) %>%
summarise(num_fires_may_june = n()) %>%
arrange(desc(num_fires_may_june))
`summarise()` ungrouping output (override with `.groups` argument)
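# Join on the cause name: each table is sorted by its own counts, so rows do not line up by position
mar_apr_causes <- mar_apr_causes %>%
left_join(may_june_causes, by = "stat_cause_descr")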
mar_apr_causes
Although there is a large increase in fires due to debris burning, arson and miscellaneous causes in March and April, the number of fires caused by children, campfires and smoking is higher in May and June, so there is no definite proof that school breaks are the cause of the spikes in March and April each year over the whole data period.
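When two summaries have each been sorted by their own counts, combining them by row position can pair a cause with the wrong number. The sketch below shows the difference on a tiny made-up example in base R; the counts are invented purely for illustration and are not taken from the fires data.

```r
# Two small summaries, each sorted by its own count (made-up numbers)
spring <- data.frame(
  cause = c("Debris Burning", "Arson", "Lightning"),
  n_spring = c(50, 30, 5)
)
early_summer <- data.frame(
  cause = c("Lightning", "Debris Burning", "Arson"),
  n_summer = c(40, 20, 10)
)

# Combining by position pairs "Debris Burning" with the Lightning count
by_position <- cbind(spring, n_summer = early_summer$n_summer)

# Matching on the cause name keeps every count with its own cause
by_key <- merge(spring, early_summer, by = "cause")

by_position
by_key
```

In a dplyr pipeline, a `left_join()` on the cause column plays the same role as `merge()` here.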
options(scipen = 999)
fires_small %>%
group_by(stat_cause_descr) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(reorder(x = stat_cause_descr, num_fires), y = num_fires) +
geom_col(fill = "dark blue") +
coord_flip() +
ggtitle("Amount of fires by cause, for combined years 1992-2015\n") +
xlab("Cause of fire\n") +
ylab("\nNumber of Fires") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
fires_small %>%
group_by(stat_cause_descr) %>%
summarise(avg_size = mean(fire_size)) %>%
ggplot +
aes(reorder(x = stat_cause_descr, avg_size), y = avg_size) +
geom_col(fill = "dark blue") +
coord_flip() +
ggtitle("Average size of fire per cause, for combined years 1992-2015\n") +
xlab("Cause of fire\n") +
ylab("\nSize of fire (Acres)") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
fires_small %>%
summarise(num_na = sum(is.na(cont_date)))
About half of the records have no containment date, which makes any meaningful analysis of burn time very difficult, so I shall not consider burn time at this stage.
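The count of missing values can also be expressed as a share of all rows, which is easier to compare across columns. A minimal sketch in base R follows; the vector holds placeholder values rather than real containment dates.

```r
# Placeholder containment dates, with NA where nothing was recorded
cont_date <- c(2453137.5, NA, 2453200.5, NA, NA, 2453300.5)

# is.na() gives TRUE/FALSE, so the mean is the proportion that is missing
share_missing <- mean(is.na(cont_date))
share_missing  # 0.5
```

Inside a dplyr pipeline the same idea would be `summarise(share_na = mean(is.na(cont_date)))`.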
fires_small %>%
group_by(fire_size_class) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_size_class, y = num_fires, fill = fire_size_class) +
geom_col(col = "black") +
scale_fill_manual(values = c("#016450", "#02818a", "#3690c0", "#67a9cf",
"#a6bddb", "#d0d1e6","#f6eff7"),
name = "Fire Size Classification",
breaks = c("A", "B", "C", "D", "E", "F", "G"),
labels = c("A: < 1/4 acre", "B: 1/4 to 10 acres",
"C: 10 to 100 acres", "D: 100 to 300 acres",
"E: 300 to 1000 acres", "F: 1000 to 5000 acres",
"G: More than 5000 acres")) +
ggtitle("Number of fires by size class, 1992-2015\n") +
xlab("\nFire Size Classification") +
ylab("Number of Fires\n") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
The maps below use geom_polygon() and coord_map() along with the ggthemes theme_map() function. The datasets package includes various bits of information on the US states, while the coordinates for state boundaries come from map_data() in ggplot2, which relies on the maps package.
# State boundary co-ordinates from map_data() (ggplot2, using the 'maps' package)
state_map <- map_data("state")
state_map
state.abb
[1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
[16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
[31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
[46] "VA" "WA" "WV" "WI" "WY"
state.name
[1] "Alabama" "Alaska" "Arizona" "Arkansas"
[5] "California" "Colorado" "Connecticut" "Delaware"
[9] "Florida" "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa" "Kansas"
[17] "Kentucky" "Louisiana" "Maine" "Maryland"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "Nevada"
[29] "New Hampshire" "New Jersey" "New Mexico" "New York"
[33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
[41] "South Dakota" "Tennessee" "Texas" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"
state_list <- tibble(state = state.abb, state_name = state.name)
state_list
The state name in the state_map dataframe is in lower case and has the column name ‘region’. I shall change the state_list tibble to be the same format so they can be joined together.
state_list <- tibble(state = state.abb, region = tolower(state.name))
Joining state_list to the fires_small dataset:
fires_states <- fires_small %>%
left_join(state_list, by = "state")
fires_states
fires_states %>%
filter(is.na(region))
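The unmatched rows have the state codes DC and PR, neither of which is in the state_list tibble. DC is the District of Columbia, which is not a state and so was not in the state_list tibble originally. PR is Puerto Rico and is also not a state but the largest US territory.
# Adding the 2 missing entries (DC and PR)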
state.abb <- append(state.abb, c("DC", "PR"))
state.name <- append(state.name, c("District of Columbia", "Puerto Rico"))
state_list <- tibble(state = state.abb, region = tolower(state.name))
# Re-joining tibbles
fires_states <- fires_small %>%
left_join(state_list, by = "state")
# Checking the join has worked properly and there are no NAs
fires_states %>%
filter(is.na(region))
Warning in `[<-.data.frame`(`*tmp*`, is_list, value = list(`23` = "<S3: blob>")) :
replacement element 1 has 1 row to replace 0 rows
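A lighter way to spot codes with no match, before doing the join at all, is to compare the distinct codes against the lookup. The sketch below uses base R and a short illustrative vector of codes (not the real column) against `state.abb` from the datasets package.

```r
# A few example state codes as they might appear in the fires data (illustrative)
fire_state_codes <- c("CA", "TX", "DC", "GA", "PR", "CA")

# Codes with no entry in the 50-state lookup from the datasets package
unmatched <- setdiff(unique(fire_state_codes), datasets::state.abb)
unmatched  # "DC" "PR"
```

Calling `datasets::state.abb` explicitly avoids picking up a modified copy of the vector from the global environment.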
# Code below brings up a "vector memory exhausted (limit reached?)" error
# fires_joined <- fires_states %>%
# right_join(state_map, by = "region")
fires_joined <- fires_states %>%
select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
right_join(state_map, by = "region")
`summarise()` ungrouping output (override with `.groups` argument)
Result! Now for the first geospatial visualisation.
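A likely reason the direct join runs out of memory is that every fire row gets repeated once for each boundary point of its state, whereas counting first leaves one row per state to join. The toy example below, in base R with made-up data frames, shows how the row counts differ.

```r
# Five toy "fires" and six toy boundary points across two regions
fires <- data.frame(region = c("a", "a", "a", "b", "b"))
map_points <- data.frame(
  region = c("a", "a", "b", "b", "b", "b"),
  long = 1:6
)

# Joining row-level data: each fire is paired with every point of its region
naive <- merge(fires, map_points, by = "region")
nrow(naive)  # 3 * 2 + 2 * 4 = 14

# Counting first, then joining: one row per boundary point
tab <- table(fires$region)
counts <- data.frame(region = names(tab), num_fires = as.integer(tab))
aggregated <- merge(counts, map_points, by = "region")
nrow(aggregated)  # 6
```

With well over a million fire rows and thousands of boundary points per state, the first form grows multiplicatively while the second stays the size of the map data.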
fires_joined %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = num_fires)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_distiller(name = "Number of Fires", palette = "PuBuGn") +
theme_map() +
coord_map("mollweide") +
ggtitle("Total US Wildfires from 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
fires_states %>%
distinct(stat_cause_descr) %>%
arrange(stat_cause_descr)
fires_states %>%
select(stat_cause_descr) %>%
group_by(stat_cause_descr) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
fires_states %>%
select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
# Function for plotting cause of fire
cause <- function(cause) {
fires_states %>%
filter(stat_cause_descr == cause) %>%
dplyr::select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = num_fires)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_distiller(name = "Number of Fires", palette = "PuBuGn") +
theme_map() +
coord_map("mollweide") +
ggtitle(paste0("Total US Wildfires caused by ", cause, " from 1992-2015")) +
theme(plot.title = element_text(hjust = 0.5))
}
cause("Arson")
`summarise()` ungrouping output (override with `.groups` argument)
Arson does seem more prevalent in the SE states of Mississippi, Georgia, Alabama and also the western state of California.
cause("Campfire")
`summarise()` ungrouping output (override with `.groups` argument)
Campfires are the most prevalent in the Western states of Oregon, California and Arizona.
cause("Children")
`summarise()` ungrouping output (override with `.groups` argument)
Fires by children are spread about the country, but the most prevalent states are California in the West, and Alabama, South Carolina and New Jersey in the East.
cause("Debris Burning")
`summarise()` ungrouping output (override with `.groups` argument)
Fires by burning debris are mostly in the southern warmer states of Texas, Georgia and North Carolina.
cause("Equipment Use")
`summarise()` ungrouping output (override with `.groups` argument)
Most of the fires caused by equipment seem to be in California.
cause("Fireworks")
`summarise()` ungrouping output (override with `.groups` argument)
Most of the fires caused by fireworks seem to be in the north of the country, primarily South Dakota, Montana and Washington state.
cause("Lightning")
`summarise()` ungrouping output (override with `.groups` argument)
Apart from a hotspot of lightning strikes in Florida, the vast majority of fires caused by lightning are in the West of the country, with the 3 most affected states being California, Oregon and Arizona.
cause("Miscellaneous")
`summarise()` ungrouping output (override with `.groups` argument)
There seem to be quite a few miscellaneous classifications in California, Texas and New York.
cause("Missing/Undefined")
`summarise()` ungrouping output (override with `.groups` argument)
The states with the most missing or undefined data are North and South Carolina, Oklahoma and California.
cause("Powerline")
`summarise()` ungrouping output (override with `.groups` argument)
Texas has the largest number of wildfires caused by powerlines. This is possibly due to the warm climate and the large proportion of the state that is dry grassland used for agriculture. (1)
cause("Railroad")
`summarise()` ungrouping output (override with `.groups` argument)
By far Florida has the most wildfires caused by railroads. I find this rather strange and will do some further investigation.
cause("Smoking")
`summarise()` ungrouping output (override with `.groups` argument)
Fires caused by smoking seem to be spread around the country, but mainly on the east and west coasts.
cause("Structure")
`summarise()` ungrouping output (override with `.groups` argument)
South Dakota has the largest proportion of fires caused by structures.
The datasets package also has the area in square miles of each state included in the state.area vector.
state.area
[1] 51609 589757 113909 53104 158693 104247 5009 2057 58560 58876
[11] 6450 83557 56400 36291 56290 82264 40395 48523 33215 10577
[21] 8257 58216 84068 47716 69686 147138 77227 110540 9304 7836
[31] 121666 49576 52586 70665 41222 69919 96981 45333 1214 31055
[41] 77047 42244 267339 84916 9609 40815 68192 24181 56154 97914
length(state.area)
[1] 50
(Area figures obtained from Wikipedia)
DC = 68 square miles, PR = 3515 square miles
# To make my life easier I'm going to remove my modified copies of the state.abb and state.name vectors (so the originals from the datasets package are used again) and make the tibble again, adding in the land area figures at the same time to make sure they are in the correct order.
rm(state.abb)
rm(state.name)
state.abb <- append(state.abb, c("DC", "PR"))
state.name <- append(state.name, c("District of Columbia", "Puerto Rico"))
# Areas in square miles for DC and PR, appended as numbers rather than strings
state.area <- append(state.area, c(68, 3515))
state_list <- tibble(state = state.abb, region = tolower(state.name), area = state.area)
# Re-joining tibbles
fires_states <- fires_small %>%
left_join(state_list, by = "state")
fires_states %>%
select(region, area) %>%
group_by(region, area) %>%
summarise(num_fires = n()) %>%
mutate(fires_sqmile = num_fires / area) %>%
arrange(desc(fires_sqmile))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
fires_states %>%
select(region, area) %>%
group_by(region, area) %>%
summarise(num_fires = n()) %>%
mutate(fires_sqmile = num_fires / area) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = fires_sqmile)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_distiller(name = "Fire per Sq Mile", palette = "PuBuGn") +
theme_map() +
coord_map("mollweide") +
ggtitle(paste0("Total US Wildfires per Square Mile from 1992-2015")) +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Puerto Rico is not shown on this map, and neither are Alaska and Hawaii, because map_data("state") only covers the contiguous states and the District of Columbia. For the regions that are shown, the south eastern states still have the highest proportion of wildfires. Interestingly, New Jersey also shows up as a hotspot in the NE of the country.
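Dividing by area can change the ranking considerably, because a small state with a modest number of fires may still be dense in fires. The sketch below uses the real `state.area` figures from the datasets package, but the fire counts are made up purely to illustrate the calculation.

```r
# Area in square miles, named by state abbreviation (datasets package)
area_sq_miles <- setNames(datasets::state.area, datasets::state.abb)

# Made-up fire counts for two states, for illustration only
num_fires <- c(NJ = 100, TX = 1000)

# Fires per square mile: the count divided by the matching area
fires_sqmile <- num_fires / area_sq_miles[names(num_fires)]
fires_sqmile
```

With these invented counts, New Jersey comes out denser than Texas even though it has a tenth of the fires, since its area is far smaller.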
fires_states %>%
select(stat_cause_descr, fire_year) %>%
group_by(fire_year, stat_cause_descr) %>%
filter(stat_cause_descr == "Arson" | stat_cause_descr == "Campfire" |
stat_cause_descr == "Children" | stat_cause_descr == "Equipment Use" |
stat_cause_descr == "Fireworks" | stat_cause_descr == "Smoking") %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_year, y = num_fires, colour = stat_cause_descr) +
geom_line() +
ggtitle("Amount of fires by cause, for years 1992-2015\n") +
xlab("\nYear") +
ylab("Number of Fires\n") +
labs(colour = "Cause of fire") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'fire_year' (override with `.groups` argument)
The 2 large peaks in arson are obvious in 1999 and 2006. There was a large heatwave in 2006, but I’m not sure why this would result in an increase in arson. Perhaps there wasn’t an actual increase in the amount of arson; instead, the heatwave may have made the ground drier than normal, creating extra fuel that helped fires spread which would not normally have become large scale fires. This may also be the reason for the other 2006 peak, in Equipment Use. Arson does however look to be decreasing since 2006, as do fires caused by children. Otherwise, most of the other causes of wildfires seem to be reasonably stable. Perhaps we shall try modelling arson and children related fires to see if we can identify a general trend.
fires_states %>%
select(stat_cause_descr, fire_year) %>%
group_by(fire_year, stat_cause_descr) %>%
filter(stat_cause_descr == "Debris Burning" | stat_cause_descr == "Lightning" |
stat_cause_descr == "Miscellaneous" | stat_cause_descr ==
"Missing/Undefined" | stat_cause_descr == "Powerline" |
stat_cause_descr == "Railroad" | stat_cause_descr == "Structure") %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_year, y = num_fires, colour = stat_cause_descr) +
geom_line() +
ggtitle("Amount of fires by cause, for years 1992-2015\n") +
xlab("\nYear") +
ylab("Number of Fires\n") +
labs(colour = "Cause of fire") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'fire_year' (override with `.groups` argument)
Similar peaks can be seen in debris burning, miscellaneous and lightning in the heatwave of 2006 that left the ground very dry. There are peaks from 1997 to 2003 in debris burning, miscellaneous and lightning, but also a trough in missing/undefined, so this is likely to be due to more accurate classification of fires and less use of the missing/undefined category.
state_map_southern <- state_map %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana")
fires_states %>%
filter(fire_year == "1992" | fire_year == "1993" | fire_year == "1994" |
fire_year == "1995") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 1992-1995") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year == "1996" | fire_year == "1997" | fire_year == "1998" |
fire_year == "1999") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 1996-1999") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year == "2000" | fire_year == "2001" | fire_year == "2002" |
fire_year == "2003") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 2000-2003") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year == "2004" | fire_year == "2005" | fire_year == "2006" |
fire_year == "2007") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 2004-2007") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year == "2008" | fire_year == "2009" | fire_year == "2010" |
fire_year == "2011") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 2008-2011") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year == "2012" | fire_year == "2013" | fire_year == "2014" |
fire_year == "2015") %>%
filter(region == "florida" | region == "georgia" | region == "alabama" |
region == "mississippi" | region == "south carolina" |
region == "north carolina" | region == "tennessee" |
region == "arkansas" | region == "louisiana") %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Total US Wildfires main cause from 2012-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
Looking at these trends, some interesting insights can be seen. For the combined years data, Florida stands out as having railroad as its main cause of wildfire, but the above plots show that railroad fires are only the main cause up to the 4 year period ending in 2003; the main cause then changes to lightning until the end of the collection period in 2015. Similarly, arson seems reasonably common in the southern states until 2007, after which it no longer appears as the most common cause of wildfire. This downward trend was also noted earlier in the overall causation plots for all states.
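The six blocks above differ only in the years they filter on, so the summarising step could be wrapped in a small function. The sketch below is one possible way to do that, not the code used for the plots; `top_cause_by_region` is a hypothetical helper name, the toy data frame is invented, and `slice_max()` assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Most common cause per region for a set of years (hypothetical helper)
top_cause_by_region <- function(data, years, regions) {
  data %>%
    filter(fire_year %in% years, region %in% regions) %>%
    count(region, stat_cause_descr, name = "num_fire") %>%
    group_by(region) %>%
    slice_max(num_fire, n = 1) %>%   # keeps ties, as top_n(1) does
    ungroup()
}

# Invented rows, only to show the shape of the input and output
toy_fires <- data.frame(
  fire_year = c(1992, 1992, 1993, 1996, 1994, 1992),
  region = c("florida", "florida", "florida", "florida", "georgia", "texas"),
  stat_cause_descr = c("Railroad", "Railroad", "Lightning", "Lightning",
                       "Arson", "Arson")
)

top_1992_1995 <- top_cause_by_region(toy_fires, 1992:1995, c("florida", "georgia"))
top_1992_1995
```

The result could then be passed to `right_join(state_map_southern, by = "region")` and the same ggplot layers as before, with only the years and the title changing between calls.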
fires_states %>%
dplyr::filter(region == "florida" & stat_cause_descr == "Railroad" &
fire_year == "1992") %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~longitude, lat = ~latitude, label =
~paste("Fire Size: ", fire_size, "/ Month: ", discovery_moy))
There is something not right with this data: it is extremely unlikely that all these fires are from railroads. So I’m going to check another year, 1998.
fires_states %>%
dplyr::filter(region == "florida" & stat_cause_descr == "Railroad" &
fire_year == "1998") %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~longitude, lat = ~latitude, label =
~paste("Fire Size: ", fire_size, "/ Month: ", discovery_moy))
The same issue is present, so there is definitely something wrong here. Let’s check some more years.
fires_states %>%
dplyr::filter(region == "florida" & stat_cause_descr == "Railroad" &
fire_year == "2008") %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~longitude, lat = ~latitude, label =
~paste("Fire Size: ", fire_size, "/ Month: ", discovery_moy))
Now that is much better, with far fewer fires occurring and all of them next to a railroad, as you would expect. Let’s check another year.
fires_states %>%
dplyr::filter(region == "florida" & stat_cause_descr == "Railroad" &
fire_year == "2015") %>%
leaflet() %>%
addTiles() %>%
addMarkers(lng = ~longitude, lat = ~latitude, label =
~paste("Fire Size: ", fire_size, "/ Month: ", discovery_moy))
2015’s data looks the same as 2008 and is what I would expect for railroad fires: occurring next to rails and being few and far between!
Having used leaflet to plot all the lat/long coordinates for the railroad fires in 1992, 1998, 2008 and 2015, I am doubtful of the validity of either the recorded classification of fire origin or the accuracy of the coordinates in the earlier years of the recording period. In those years the majority of plotted fires started many miles away from any railroad. This is especially obvious for the 2 fires that occurred near Key West in 1992, which has no rail links anywhere nearby, only a road. The coordinates have been given to 5 decimal places, which allows a precision of about 1.1 meters, so unless they were entered into the database incorrectly, I think it is far more likely that an incorrect cause descriptor was entered for a lot of fires in the earlier years. This error seems to have been rectified in the later years of the dataset, as the 2008 and 2015 data behave as expected.
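The 1.1 meter figure can be checked with a little arithmetic. The sketch below assumes an equatorial circumference of roughly 40,075 km and an approximate latitude for Key West; both numbers are assumptions made for the illustration, not values from the dataset.

```r
# Approximate equatorial circumference of the Earth in meters (assumption)
earth_circumference_m <- 40075017

# Length of one degree of latitude, and of a 0.00001 degree step
meters_per_degree <- earth_circumference_m / 360
lat_step_m <- meters_per_degree * 1e-5
lat_step_m

# A longitude step shrinks with the cosine of the latitude;
# 24.55 is an approximate latitude for Key West (assumption)
key_west_lat <- 24.55
lon_step_m <- lat_step_m * cos(key_west_lat * pi / 180)
lon_step_m
```

Either way, a step in the fifth decimal place is on the order of a meter, far smaller than the distances between the plotted fires and the nearest railroad.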
fires_states %>%
select(region, fire_size_class) %>%
group_by(region, fire_size_class) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = fire_size_class)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Fire Size Class", palette = "PuBuGn") +
ggtitle("Most common wildfire size per State 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
select(region, fire_size_class) %>%
filter(fire_size_class == "G") %>%
group_by(region) %>%
summarise(num_fire = n()) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = num_fire)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_distiller(name = "Number of Fires", palette = "PuBuGn") +
ggtitle("Number of large class G fires per State 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
From the plots we can see that the Western states have both the most small fires and the most large fires! Not the most helpful plots…
fires_states %>%
dplyr::select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Months with most fires per State 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% c("1992", "1993", "1994", "1995")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 1992-1995") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
# theme(legend.position = "left")
fires_states %>%
filter(fire_year %in% c("1996", "1997", "1998", "1999")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 1996-1999") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% c("2000", "2001", "2002", "2003")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2000-2003") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% c("2004", "2005", "2006", "2007")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2004-2007") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% c("2008", "2009", "2010", "2011")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2008-2011") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% c("2012", "2013", "2014", "2015")) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2012-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
The above plots are quite interesting. The month with the most fires varies widely between states. Broadly, the eastern half of the country has the most fires in spring (Feb-May), while the western half has the most fires later on, in summer and fall (Jun-Oct). There are, however, a few exceptions: in the 2004-2007 and 2008-2011 data, Texas has the most fires in January. Florida also mostly conformed to the East/West split, with its worst month for fires being March or April up until 2007; the most common month then moves later, into June and July, for the rest of the reporting period until 2015. This may be due to the main cause of fires in Florida changing from railroad to lightning at about the same time, as we noted earlier when looking at causation. As July is the main month for tropical storms and lightning in Florida, this is a possible reason for the peak month falling later in the year than before. (2)
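The six period maps above repeat the same pipeline with a different year filter each time. One way to get the same peak-month table in a single pass is sketched below. The helper name `peak_month_by_period`, its arguments, and the `toy_fires` rows are all hypothetical (the toy values are invented so the block runs on its own); only the column names `region`, `fire_year` and `discovery_moy` come from the chunks above. It uses `slice_max()`, which keeps ties in the same way as `top_n(1)`.

```r
library(dplyr)

# Invented toy rows standing in for fires_states; NOT real data.
toy_fires <- tibble(
  region        = c("state_a", "state_a", "state_a", "state_a",
                    "state_b", "state_b", "state_b"),
  fire_year     = c(1992, 1993, 1993, 2009, 2005, 2005, 2006),
  discovery_moy = c("Mar", "Mar", "Apr", "Jun", "Jan", "Jan", "Aug")
)

# Hypothetical helper: bin years into fixed-width periods, then keep the
# month(s) with the most fires for every region/period pair.
peak_month_by_period <- function(fires, first_year = 1992, last_year = 2015,
                                 width = 4) {
  breaks <- seq(first_year, last_year + 1, by = width)
  fires %>%
    # fire_year may be stored as text, so coerce before binning
    mutate(fire_year = as.integer(as.character(fire_year))) %>%
    dplyr::filter(fire_year >= first_year, fire_year <= last_year) %>%
    mutate(
      period_start = breaks[findInterval(fire_year, breaks)],
      period       = paste(period_start, period_start + width - 1, sep = "-")
    ) %>%
    count(region, period, discovery_moy, name = "num_fire") %>%
    group_by(region, period) %>%
    slice_max(num_fire, n = 1, with_ties = TRUE) %>%
    ungroup()
}

peak_months <- peak_month_by_period(toy_fires)
peak_months
```

The result could then be joined to `state_map` and drawn with `facet_wrap(~ period)` to show every period in one figure, which would make shifts such as the one described for Florida easier to compare.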